Maintainer: Jianhai Zhang
spatialHeatmap Package
The spatialHeatmap package provides functionalities for visualizing cell-, tissue- and organ-specific data of biological assays by coloring the corresponding spatial features defined in anatomical images according to a numeric color key. The color scheme used to represent the assay values can be customized by the user. This core functionality is called a spatial heatmap (SHM) plot. It is enhanced with nearest neighbor visualization tools for groups of measured items (e.g. gene modules) sharing related abundance profiles, including matrix heatmaps combined with hierarchical clustering dendrograms and network representations.
Related software tools for biological applications in this field are largely based on pure web applications (Winter et al. 2007; Waese et al. 2017) or local tools (Maag 2018; Muschelli, Sweeney, and Crainiceanu 2014) that typically lack customization functionalities. These restrictions limit users to utilizing pre-existing expression data and/or fixed sets of anatomical image collections. To close this gap for biological use cases, we have developed spatialHeatmap as a generic R/Bioconductor package for plotting quantitative values onto any type of spatially mapped images in a programmable environment and/or in an intuitive to use graphical user interface (GUI) application. For details, refer to the package vignette.
To plot SHMs, a pair of formatted data and an aSVG file (see aSVG below) is required. The aSVG file can come from the aSVG repository or be created by users. If the target aSVG file is not available in this repository, users should make a custom one. This tutorial explains the detailed process of making aSVG files. To reproduce the results in this tutorial, all the files used are available for download.
This tutorial covers the following aspects: the aSVG repository, three ways of making SVG shapes in Inkscape, the requirements on data format, instructions on formatting aSVG files, and a simple example of making SHMs with the aSVG created in this tutorial. The Supplement presents more details on the numeric data objects and on how to convert aSVGs from SHM format to EBI format.
The spatialHeatmap package should be installed from an R (version \(\ge\) 3.6) session with the BiocManager::install command.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("spatialHeatmap")
Next, the packages required for running the sample code in this vignette need to be loaded.
library(spatialHeatmap); library(SummarizedExperiment); library(GEOquery)
The following lists the vignette(s) of this package in an HTML browser. Clicking the corresponding name will open this vignette.
browseVignettes('spatialHeatmap')
To assign colors to specific features in SHMs, annotated SVG (aSVG) files are used where the shapes of interest are labeled according to certain conventions so that they can be addressed and colored programmatically. An aSVG repository that can be used by spatialHeatmap directly has been generated by the EBI Gene Expression Group. It contains anatomical aSVG images from different species. These SVGs are also used by the Expression Atlas database. In addition, spatialHeatmap has its own repository called spatialHeatmap aSVG Repository, where some aSVG files developed in this project are already deposited (e.g. Figure 1).
If users cannot find a target aSVG in the two repositories, this step-by-step SVG tutorial for creating custom aSVG images is recommended. The BAR eFP browser at University of Toronto contains many anatomical images, and these images are good templates for making custom aSVGs.
We will add more aSVGs to our repository in the future and users are welcome to deposit their own aSVGs there to share with other spatialHeatmap users.
To make SVG images, a PNG image with defined tissues and the SVG editor Inkscape are required. The image editor GIMP can be used if the tissue outlines in the PNG image are clear. Inkscape is used to draw the SVG image with the PNG image as a template, and annotate the SVG image in accordance with the data. The values in data are used to color different tissues in spatial heatmaps. GIMP could be used to automatically extract shapes for the SVG image.
There are 3 different options to make SVG images: Draw Over Template Shapes, Use Regular Shapes, Use GIMP. If tissues in the template image have unclear outlines, the first 2 options have to be used, as GIMP is applicable to tissues with clear outlines.
Technically, accepted SVG elements are g, path, rect, ellipse, use, and title. Other elements will raise errors or warnings when using spatialHeatmap. The use elements are not allowed inside g. The g elements should not have transform attribute with a matrix value, which indicates relative coordinates. To remove the transform attribute, ungroup and regroup the respective g element.
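The element rules above can be checked roughly before loading a file into spatialHeatmap. The following is a minimal sketch in base R, not the package's own validation: the inline SVG text, the variable names, and the line-based regular expressions are assumptions made for illustration. Real Inkscape files also contain metadata elements that this naive check would list.

```r
# Accepted element names, as listed in the text above.
accepted <- c('g', 'path', 'rect', 'ellipse', 'use', 'title')

# A small inline SVG used only for illustration (hypothetical content).
svg.txt <- c(
  '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">',
  '  <g id="root_pGL2" transform="matrix(1,0,0,1,5,5)">',
  '    <path id="path1" d="M 0,0 L 1,1 Z"/>',
  '  </g>',
  '  <circle id="circle1" cx="5" cy="5" r="2"/>',
  '</svg>'
)

# Extract the name of every opening tag (closing tags start with "</").
tags <- unlist(regmatches(svg.txt, gregexpr('<[A-Za-z][A-Za-z0-9:]*', svg.txt)))
tags <- sub('^<', '', tags)
tags <- sub('^svg:', '', tags)  # Inkscape may prefix names with "svg:".

# Elements outside the accepted set (the root "svg" element is ignored).
not.accepted <- setdiff(unique(tags), c(accepted, 'svg'))
not.accepted

# Lines where a "g" element carries a transform="matrix(...)" attribute.
g.matrix <- grep('<(svg:)?g[[:space:]][^>]*transform="matrix', svg.txt)
g.matrix
```

In this toy input, the `circle` element and the group on the second line would need attention in Inkscape before plotting.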
Select 'Fill and Stroke...' under 'Object' tab on the top. On the right panel 'Fill and Stroke (Shift+Ctrl+F)', set 'Stroke style' 3.000 px and press 'Enter' key.
Press '+' key to zoom in and select a shape to start. Click at different corners of the shape to draw an outline. Finally, click at the first corner to seal the outline.
If the new shape is filled with a color, click 'No paint' under the 'Fill' tab on the panel 'Fill and Stroke (Shift+Ctrl+F)'. Then a new sealed transparent shape is completed.
Select 'Edit paths by nodes (F2)' on the left tool bar, and draw a rectangle over the new shape. Select 'Make selected nodes corner' on the top.
Drag nodes and edges to align the new shape with template shape. On the fill and stroke panel, under 'Fill' tab, select 'Flat color' and adjust the color scales to label the new shape. Then the first shape is made successfully.
If the template shapes are similar to regular shapes such as rectangles or circles, the regular shapes can be used to make new shapes.
Select 'Create rectangles and squares (F4)' on the left, and draw a rectangle over a shape template. Convert this object to path by selecting 'Object to Path' under 'Path' tab on the top.
Click 'No paint' under the 'Fill' tab on the fill and stroke panel to make the rectangle transparent. Rotate the rectangle. Select 'Edit paths by nodes (F2)' on the left tool bar. If necessary, add a node by double-clicking on an edge. Drag nodes and edges to align the rectangle with the underlying shape template.
Select 'Edit paths by nodes (F2)' on the left tool bar, and draw a rectangle over the new shape. Select 'Make selected nodes corner' on the top.
Drag the handles at nodes to adjust edges for fine alignment with the template shape. On the fill and stroke panel select 'Flat color' under 'Fill' tab to color this new shape. Then the new shape is successfully made.
If shapes in the template PNG image have clear outlines, the SVG image can be extracted with GIMP. Unclear outlines would lead to messy SVG shapes.
Open the PNG template in GIMP, and open 'Paths' panel. Right click and select 'By Color'.
Now the shapes can be selected by colors. For example, clicking on a white shape selects all shapes in white. Right click, select 'To Path', then all the white shapes are extracted to the 'Paths' panel. Similarly, extract the yellow shapes.
Click in front of each extracted shape to show the 'eye' icon. Mouse over the extracted shapes, right click, select 'Merge Visible Paths'. After merging, export the paths as an SVG image (root_gimp.svg). Next, edit the exported SVG image in Inkscape.
The exported SVG image root_gimp.svg is accessible here (hover over the image, right click, and select 'Save image as…').
Open the exported SVG image in Inkscape. Under 'Object' tab at the top, select 'Fill and Stroke...'. First click the image, then select 'Flat color' under 'Fill' tab. Adjust color scales to fill the image.
All the paths in SVG image generated in GIMP are combined as a whole. In order to separate the paths, first click the image then click 'Break Apart' under the top 'Path' tab. Now the paths/shapes are separated, but the outlines of shapes are not stroked. Thus use 'Ctrl+A' to select all shapes, on the fill and stroke panel select 'Flat color' under 'Stroke paint' tab, and set a number under 'Stroke style' tab (e.g. 1.333 px).
Click the white area to unselect the whole image. Press '+' key to zoom in, try to move different shapes, and delete those unnecessary by pressing 'Delete' key.
Use 'Ctrl+A' to select all shapes. Click 'No paint' under 'Fill' tab on the fill and stroke panel. Click the white area to unselect the whole image. The blank SVG image is ready to format with the data, which is downloadable here.
If the SVG image comprises numerous shapes (left image below), the speed of creating SHMs will be compromised in spatialHeatmap. To maintain acceptable runtime, the strategy of overlaying a large base shape with structural shapes is recommended. Specifically, the base shape is the overall outline of many tissues, and the structural shapes are intersecting shapes or lines that define the borders between individual tissues (right image below). This technique saves a large number of coordinates and thus reduces runtime significantly.
The bridges between data and aSVG are the samples/features. Only features having matching counterparts between data and aSVG are colored in the spatial heatmaps. Therefore, the formatting process of SVG image is in accordance with the data formatting. The accepted data classes include vector, data frame, or SummarizedExperiment (SE) (Morgan et al. 2018). Formatting the data is essentially to define samples and/or conditions. In the following, data formatting is explained on SE, since this data class is widely used in biological omics analysis. Details on vector and data frame are presented in Supplement.
SE applies to data involving many samples and conditions. The data matrix with rows and columns being genes and sample/conditions respectively is stored in the assay slot. The data formatting is essentially to make a targets file. It is a data frame and usually contains at least 2 columns defining replicates of samples and conditions respectively. The targets file is in the colData slot. In the rowData slot, a data frame of annotations for rows in assay is optionally added.
The example data GSE14502 is from GEO. It is a microarray analysis on Arabidopsis thaliana root/shoot tissues under control and hypoxia (Mustroph et al. 2009), and is downloaded through GEOquery (Davis and Meltzer 2007).
Access the GEO dataset GSE14502 and convert it to SummarizedExperiment. To avoid downloading the same data repeatedly, the downloaded data is cached.
cache.pa <- '~/.cache/shm' # The path of cache.
gset <- read_cache(cache.pa, 'gset') # Retrieve data from cache.
if (is.null(gset)) { # Save downloaded data to cache if it is not cached.
gset <- getGEO("GSE14502", GSEMatrix=TRUE, getGPL=TRUE)[[1]]
save_cache(dir=cache.pa, overwrite=TRUE, gset)
}
se.arab <- as(gset, "SummarizedExperiment")
Use gene symbols to replace probes.
rownames(se.arab) <- make.names(rowData(se.arab)[, 'Gene.Symbol'])
A slice of the data matrix in assay slot.
assay(se.arab)[1:3, c(25:29, 36:39)]
## GSM362192 GSM362193 GSM362194 GSM362195 GSM362196 GSM362203 GSM362204
## ORF25 4.759222 5.108704 5.017523 5.183043 4.956723 4.799543 5.092339
## NAD4L 4.838742 4.964549 4.877558 5.210359 5.316448 4.980785 5.418198
## ArthMp059 8.365082 5.655345 5.580664 5.990833 7.483474 6.310126 7.499970
## GSM362205 GSM362206
## ORF25 4.823938 5.209876
## NAD4L 4.985976 5.376374
## ArthMp059 5.952638 6.255218
A slice of the experiment design, which is stored in colData slot.
colData(se.arab)[c(25:29, 36:39), 1:4]
## DataFrame with 9 rows and 4 columns
## title geo_accession status
## <character> <character> <character>
## GSM362192 root_control_pGL2_rep1 GSM362192 Public on Oct 12 2009
## GSM362193 root_control_pGL2_rep2 GSM362193 Public on Oct 12 2009
## GSM362194 root_control_pGL2_rep3 GSM362194 Public on Oct 12 2009
## GSM362195 root_hypoxia_pGL2_rep1 GSM362195 Public on Oct 12 2009
## GSM362196 root_hypoxia_pGL2_rep2 GSM362196 Public on Oct 12 2009
## GSM362203 root_control_pCO2_rep1 GSM362203 Public on Oct 12 2009
## GSM362204 root_control_pCO2_rep2 GSM362204 Public on Oct 12 2009
## GSM362205 root_hypoxia_pCO2_rep1 GSM362205 Public on Oct 12 2009
## GSM362206 root_hypoxia_pCO2_rep2 GSM362206 Public on Oct 12 2009
## submission_date
## <character>
## GSM362192 Jan 21 2009
## GSM362193 Jan 21 2009
## GSM362194 Jan 21 2009
## GSM362195 Jan 21 2009
## GSM362196 Jan 21 2009
## GSM362203 Jan 21 2009
## GSM362204 Jan 21 2009
## GSM362205 Jan 21 2009
## GSM362206 Jan 21 2009
The title column includes 'samples' and 'conditions', so it is used to make the targets file based on the following requirements.
It is a data frame and usually has at least one column of samples and one column of conditions. The rows correspond to the columns in the assay slot. If the condition column is not defined, the samples are assumed to be under the same condition.
The sample column specifies sample replicates. It is crucial that replicate names of the same sample must be identical. Otherwise, they are treated as different samples. E.g. 'root_pGL2' and 'root_pCO2' in Table 1.
The sample identifiers of interest must be identical to the respective features of interest in the aSVG. This means that even a dot, underscore, space, etc. can make a difference and lead to target features not being colored in spatial heatmaps. Since double underscore (__) is a reserved separator in spatialHeatmap, it cannot be used in sample or condition identifiers.
The condition column has the same requirements as the sample column. E.g. 'control' and 'hypoxia' in Table 1.
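The identifier requirements above can be verified with a few lines of base R. This is a minimal sketch: the sample and condition identifiers are the ones referred to in Table 1, while the aSVG feature ids in `svg.ids` are hypothetical and chosen to show that matching is case sensitive.

```r
# Sample and condition identifiers referred to in Table 1.
samples <- c('root_pGL2', 'root_pCO2')
conditions <- c('control', 'hypoxia')

# Hypothetical aSVG feature ids; 'root_pgl2' differs only in letter case.
svg.ids <- c('root_pgl2', 'root_pCO2', 'path123')

# Identifiers must not contain the reserved separator '__'.
has.sep <- grepl('__', c(samples, conditions), fixed=TRUE)
any(has.sep)

# Matching is exact, so only identical strings count as matches.
matched <- intersect(samples, svg.ids)
unmatched <- setdiff(samples, svg.ids)
matched
unmatched
```

Here only 'root_pCO2' would be colored, because 'root_pGL2' has no identical counterpart among the assumed aSVG ids.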
The completed targets file is packaged in spatialHeatmap, and is also downloadable here (click the file, click "Raw", right click, and select "Save as…"). Selected rows are shown in Table 1.
Note, the data file name should not contain parentheses. E.g. target_arab.txt is expected while target_arab(1).txt will cause errors.
tar.arab <- system.file('extdata/shinyApp/example/target_arab.txt', package='spatialHeatmap')
target.arab <- read.table(tar.arab, header=TRUE, row.names=1, sep='\t')
target.arab[c(25:29, 36:39), ]
| | samples | conditions |
|---|---|---|
| root_control_pGL2_rep1 | root_pGL2 | control |
| root_control_pGL2_rep2 | root_pGL2 | control |
| root_control_pGL2_rep3 | root_pGL2 | control |
| root_hypoxia_pGL2_rep1 | root_pGL2 | hypoxia |
| root_hypoxia_pGL2_rep2 | root_pGL2 | hypoxia |
| root_control_pCO2_rep1 | root_pCO2 | control |
| root_control_pCO2_rep2 | root_pCO2 | control |
| root_hypoxia_pCO2_rep1 | root_pCO2 | hypoxia |
| root_hypoxia_pCO2_rep2 | root_pCO2 | hypoxia |
Use the targets file to replace the data frame in colData slot.
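The replacement step can be sketched on a toy object as follows. This is an illustration under stated assumptions rather than the exact code of the original tutorial: `se.toy`, `target.toy`, and all values are made up, and the commented last line shows the analogous call that would presumably apply to `se.arab` and `target.arab`.

```r
library(SummarizedExperiment)

# Toy stand-ins for the tutorial objects (made-up values).
mat <- matrix(1:8, nrow=2,
  dimnames=list(c('gene1', 'gene2'), paste0('GSM', 1:4)))
se.toy <- SummarizedExperiment(assays=list(expr=mat))
target.toy <- data.frame(
  samples=rep('root_pGL2', 4),
  conditions=rep(c('control', 'hypoxia'), each=2),
  row.names=colnames(mat)
)

# The rows of the targets table must be in the same order as the assay
# columns. In this toy case the names are equal, so this can be checked.
stopifnot(identical(rownames(target.toy), colnames(se.toy)))

# Replace the colData slot with the targets table.
colData(se.toy) <- DataFrame(target.toy)
colData(se.toy)

# For the tutorial objects, the analogous (assumed) call would be:
# colData(se.arab) <- DataFrame(target.arab)
```

After this step, the column names 'samples' and 'conditions' are available to functions that take `sam.factor` and `con.factor` arguments.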
By pre-processing convention, gene expression profiling data should be normalized, aggregated, and filtered. The dataset GSE14502 is already normalized by RMA (Gautier et al. 2004), so the pre-processing only includes replicate aggregation and filtering.
The data is aggregated based on 'sample__condition' replicates internally, and a slice of the result is shown below.
se.aggr.arab <- aggr_rep(data=se.arab, sam.factor='samples', con.factor='conditions', aggr='mean')
assay(se.aggr.arab)[1:3, c(11:12, 16:17)]
## root_pGL2__control root_pGL2__hypoxia root_pCO2__control
## ORF25 4.961816 5.069883 4.945941
## NAD4L 4.893616 5.263403 5.199492
## ArthMp059 6.533697 6.737153 6.905048
## root_pCO2__hypoxia
## ORF25 5.016907
## NAD4L 5.181175
## ArthMp059 6.103928
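To make the idea of 'sample__condition' aggregation concrete, the following base R sketch averages replicates on a toy matrix. It only illustrates the concept and is not the internal implementation of aggr_rep; the matrix values and replicate names are made up.

```r
# Toy expression matrix: 2 genes x 5 replicates (made-up values).
mat <- matrix(c(4, 6, 5, 7, 6, 8, 1, 3, 3, 5), nrow=2,
  dimnames=list(c('gene1', 'gene2'), paste0('rep', 1:5)))
samples <- c(rep('root_pGL2', 3), rep('root_pCO2', 2))
conditions <- rep('control', 5)

# Build one 'sample__condition' label per replicate column.
fac <- paste(samples, conditions, sep='__')

# Average the replicate columns that share the same label.
grp <- unique(fac)
aggr <- sapply(grp, function(g) rowMeans(mat[, fac == g, drop=FALSE]))
aggr
```

The result has one column per unique 'sample__condition' label, which is the layout shown in the aggregated assay above.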
Genes with expression values larger than 6 in at least 3% of all samples (pOA=c(0.03, 6)), and with a coefficient of variation (CV) between 0.30 and 100 (CV=c(0.30, 100)) are retained.
# Filter genes with low variance and low intensity.
se.fil.arab <- filter_data(data=se.aggr.arab, sam.factor='samples', con.factor='conditions', pOA=c(0.03, 6), CV=c(0.30, 100), dir=NULL)
## All values before filtering:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.345 4.879 6.481 6.763 8.263 15.107
## All coefficient of variances (CVs) before filtering:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01047 0.03424 0.05347 0.07706 0.09526 0.54344
## All values after filtering:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.644 4.838 6.249 7.364 9.756 15.004
## All coefficient of variances (CVs) after filtering:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3008 0.3203 0.3385 0.3531 0.3735 0.5434
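The two filtering criteria can be illustrated in base R on a toy matrix. This sketch approximates the criteria as described in the text and is not the code used inside filter_data; the gene names, values, and the inclusive handling of the thresholds are assumptions.

```r
# Toy matrix: 3 genes x 4 aggregated samples (made-up values).
mat <- rbind(
  geneA = c(2, 4, 10, 12),    # expressed and variable
  geneB = c(7, 7.1, 7.2, 7),  # expressed but nearly constant
  geneC = c(1, 2, 1, 2)       # variable but never above 6
)
p <- 0.03; A <- 6           # pOA=c(0.03, 6)
cv.min <- 0.30; cv.max <- 100  # CV=c(0.30, 100)

# Proportion of samples with values larger than A, per gene.
prop.over <- rowMeans(mat > A)

# Coefficient of variation: standard deviation divided by mean.
cv <- apply(mat, 1, sd) / rowMeans(mat)

# Keep genes that satisfy both criteria.
keep <- prop.over >= p & cv >= cv.min & cv <= cv.max
mat.fil <- mat[keep, , drop=FALSE]
rownames(mat.fil)
```

Only geneA passes both criteria in this toy example: geneB fails the CV range and geneC never exceeds the intensity threshold.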
There are two aSVG formats that spatialHeatmap accepts: SHM format and EBI format. The former is specifically designed for spatialHeatmap while the latter is from Expression Atlas Anatomogram. If the aSVGs are only used for making spatial heatmaps, the SHM format is recommended, since it is easier and saves time.
A path represents a shape. If a tissue consists of multiple paths and is expected to be colored in the spatial heatmap, all its paths must be grouped as a whole (indicated by svg:g) rather than combined (Ctrl+K). A group should not include another group, which means all elements in a group should be single paths. However, if a multiple-path tissue is not expected to be colored in the spatial heatmap, there is no need to group them and the paths can keep random ids.
If a tissue is expected to be colored in the spatial heatmaps, its id value must have an identical tissue counterpart in the data/targets file. It means even a difference of dot, space, underscore, uppercase, or lowercase matters. If a tissue is a group, the group id counts while ids of inside paths are useless.
In the end, all the tissues (groups and single paths) must be grouped together as a container group, and this large group must be the last element in the 'XML Editor'.
In order to be painted with the same color in spatial heatmaps, tissues consisting of multiple shapes should be grouped (indicated by g) rather than combined (Ctrl+K). Take the root_pGL2 as example. Select shapes of this tissue by clicking their edges while pressing 'Shift' key. Mouse over any edge of selected shapes, right click and select 'Group'.
Click 'Flat color' under the 'Fill' tab on the fill and stroke panel, and fill the grouped shapes with a preferred color.
A group (e.g. root_pSCR) can be resized by simply dragging the black arrows on the edge or changing the width (W) and height (H) values on the top toolbar. In either case, in the "XML Editor" panel the resized group will have a transform attribute with a matrix value, which means the coordinates of this group are transformed because of resizing.
The spatialHeatmap package does not accept the "transform-matrix" pair in g elements. To remove this pair, simply ungroup (Ctrl+Shift+G) and regroup (Ctrl+G) the relevant shapes.
Under the 'Edit' tab on the top, select 'XML Editor...'. Click the group to select it. On the 'XML Editor (Shift+Ctrl+X)' panel, first click the 'id' and type in root_pGL2 then click 'Set'. It is critical that the id 'root_pGL2' has an identical tissue counterpart in the data. Otherwise, this tissue will not be colored. After that, the first group is done.
It is optional to add an 'ontology' attribute to the tissue. Retrieve the ontology id for 'root atrichoblast epidermis' (root_pGL2) at Ontology Lookup Service. Click the tissue in the XML Editor, type in 'ontology' and the retrieved id in front of and below 'Set' respectively, and click 'Set', then the 'ontology' attribute is assigned.
Similarly, group and set id for root_pCO2, root_pSCR, root_pWOL respectively. When grouping the small vasculature shapes in the center, a shortcut is to draw a rectangle over them to select all rather than clicking each individually. Note if a tissue sample contains only one shape, there is no need to group it, but the id should be identical to the corresponding sample in the data in order to be colored in spatial heatmaps.
Brown, blue, orange, and purple label root_pGL2, root_pCO2, root_pSCR, and root_pWOL respectively. The blank shapes have random ids and will not be colored in the spatial heatmap.
At last, it is required to group all tissues (groups and independent paths) as a container group. To do so, use 'Ctrl+A' to select all, mouse over the selection, right click, and click group. Note it is optional to set a specific id for this container group.
It is crucial that this container group must be the last element on the 'XML Editor'.
Text and legend can be added to the SVG image, which is optional. To include text, first click "Create and edit text objects (F8)" on the left tool bar. Click the target position and type text, here "root_pGL2". Then the text object is created.
To edit text, click "View and select font family, font size, and other text properties (Shift+Ctrl+T)" on the top tool bar. On the text panel, edit the text such as font size.
Select the text object and convert it to path.
Now the text is a group (g). If the text group is not inside the container group, click "Indent node" on the XML editor to move it into the container.
The text group id must have specific prefix and suffix so as to retain desired output in final SHMs. Specifically, the prefix should be text, otherwise the letters in SHMs might be distorted. The suffix should be _localLGD or _globalLGD, otherwise the text color (black) will be missing and the text will be transparent. If _localLGD, the text will only be present in the legend image (Figure 1 right) while if _globalLGD the text will be present in the main SHMs (Figure 1 middle) and legend image. An example text group id is text927_globalLGD.
If non-text legends are included, the path ids only need to have the suffix _localLGD or _globalLGD such as rect161_globalLGD. Without the suffix, the legend will be transparent in SHMs. In addition, the legend paths should also be included in the large container group.
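The id conventions for text and legend elements can be checked with regular expressions. The sketch below is a base R illustration only: 'text927_globalLGD' and 'rect161_globalLGD' come from the text above, while the other ids are hypothetical.

```r
# Example ids: the first and third are from the text, the rest are made up.
ids <- c('text927_globalLGD', 'text12_localLGD', 'rect161_globalLGD',
  'text55', 'g88_localLGD')

# Legend elements need the suffix '_localLGD' or '_globalLGD'.
has.suffix <- grepl('_(localLGD|globalLGD)$', ids)

# Text groups additionally need the prefix 'text'.
has.text.prefix <- grepl('^text', ids)

data.frame(id=ids, suffix=has.suffix, text.prefix=has.text.prefix)
```

A text group should satisfy both checks, whereas a non-text legend path only needs the suffix.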
All the paths need to use absolute positions. To do so, click 'Preferences...' under 'Edit' tab. Go to 'Input/Output' → 'SVG Output' → 'Path data' → 'Path string format', and select 'Absolute'.
This setting only affects newly created paths. To trigger the effects on existing paths, click 'Select All in All Layers' under the 'Edit' tab, and use the arrow key to nudge the selection, e.g. one step forward and one step backward. Then all the paths are rewritten and absolutely positioned. The absolute position is indicated by the capital letters (M, C, etc.) in the 'd' attribute of a path.
To ensure correct height and width, go to 'Document Properties...' under 'File' tab, and select 'Custom size' under 'Page' tab, then set 'Units' to 'px' and click on 'Resize page to drawing or selection'.
Save the aSVG file as 'arabidopsis.thaliana_root.cross_shm.svg', following the naming scheme '<species>_[view]_shm.svg'; the file is downloadable here. Note: the aSVG file name must end with '.svg', and the part before it should consist only of letters, digits, dots, or underscores; parentheses should be avoided. E.g. 'arabidopsis.thaliana_root.cross_shm(1).svg' will cause errors.
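Whether a path uses absolute positions can be inspected from its 'd' attribute. The following base R sketch flags path strings that contain lowercase (relative) commands; the two path strings are made up for illustration, and the closepath command 'z' is ignored here on the assumption that it carries no coordinates.

```r
# Two hypothetical 'd' attribute values describing the same kind of curve.
d <- c(
  abs.path='M 10,10 C 20,20 30,20 40,10 Z',
  rel.path='m 10,10 c 10,10 20,10 30,0 z'
)

# Lowercase path commands (except 'z') indicate relative coordinates.
is.relative <- setNames(grepl('[mlhvcsqta]', d), names(d))
is.relative
```

Paths flagged here would need the nudging step described above before the file is used.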
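The file naming rule can be expressed as a regular expression. This is a small base R sketch of the rule as stated in the text, not a check taken from the package itself.

```r
# File names from the text: the first is valid, the second is not.
svg.names <- c('arabidopsis.thaliana_root.cross_shm.svg',
  'arabidopsis.thaliana_root.cross_shm(1).svg')

# Letters, digits, dots, or underscores, followed by the '.svg' extension.
valid <- grepl('^[A-Za-z0-9._]+\\.svg$', svg.names)
valid
```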
By now the aSVG file is done.
The aSVG file 'arabidopsis.thaliana_root.cross_shm.svg' generated above is ready to use for plotting spatial heatmaps. It is packaged in spatialHeatmap and is accessed below.
svg.root <- system.file('extdata/shinyApp/example/arabidopsis.thaliana_root.cross_shm.svg', package='spatialHeatmap')
Plot spatial heatmaps on gene HRE2. In Figure 1, it is manifest that gene HRE2 is showing higher expression level in hypoxia than in control, and thus might play an interesting role in hypoxia resistance.
spatial_hm(svg.path=svg.root, data=se.fil.arab, sam.factor='samples', con.factor='conditions', ID=c("HRE2"), legend.nrow=4, bar.width=0.1, legend.r=1)
## Coordinates: arabidopsis.thaliana_root.cross_shm.svg ...
## CPU cores: 1
##
## Potential error detected in these elements: 'root_pWOL;root_pWOL'! If they are groups, please remove the 'transform' attribute with a 'matrix' value by ungrouping and regrouping the respective groups in Inkscape. If individual paths, consider deleting them in Inkscape. Otherwise, colors in spatial heatmap might be shifted!
## Features in data not mapped: root_total, root_p35S, root_pSHR, root_pSUC2, root_pSultr2.2, root_pPEP, root_pRPL11C, shoot_total, shoot_p35S, shoot_pGL2, shoot_pRBCS, shoot_pSUC2, shoot_pSultr2.2, shoot_pCER5, shoot_pKAT1
## ggplots/grobs: arabidopsis.thaliana_root.cross_shm.svg ...
## ggplot: HRE2, control hypoxia
## Legend plot ...
## CPU cores: 1
## Converting "ggplot" to "grob" ...
## HRE2_control_1 HRE2_hypoxia_1
Figure 1 Root Spatial Heatmaps. The matching tissues between data and aSVG are colored in the middle spatial heatmaps. On the right is the legend plot, where matching tissues are labeled.
If errors or warnings arose when using spatialHeatmap, refer to the aSVG technical requirements and aSVG formatting requirements, by which most issues could be resolved.
This section presents details of data formatting on vector and data.frame and the EBI aSVG format.
The numeric data used to color the features in aSVG images can be provided as three different object types including vector, data.frame, and SummarizedExperiment (SE). When working with complex omics-based assay data, the latter provides the most flexibility, and thus should be the preferred container class for managing numeric data in spatialHeatmap. Both data.frame and SE can hold data from many measured items, such as many genes or proteins. In contrast to this, the vector class is only suitable for data from single items. Due to its simplicity this less complex container is often useful for testing or when dealing with simple data sets.
In data assayed only at spatial dimension, there are two factors samples and conditions, while data assayed at spatial and temporal dimension contains an additional factor time points or development stages. This tutorial only covers the spatial data. To view the usage of spatiotemporal data, see the package vignette.
The data class vector applies to several numeric values measured for a single item (e.g. gene). If one or more conditions are provided, the samples and conditions should be connected by a double underscore, i.e. in the form of 'sample__condition'. Since '__' is a reserved separator, the naming scheme of 'sample' and 'condition' should not use it. If no conditions are provided, all the samples are assumed to have the same condition.
Take the samples and conditions in Table 1 for example. The two samples are 'root_pGL2' and 'root_pCO2' and two conditions are 'control' and 'hypoxia'. Assume the two samples have matching counterparts in the aSVG. Since there are two conditions for each sample, the vector should contain four target values. The following code generates five random values so that the first four are the target values while the last one is from a third assumed sample that has no counterparts in the aSVG.
# Random numeric values.
vec <- sample(x=1:100, size=5)
Name the first 4 values with the scheme 'sample__condition', and last with a random name notMapped. Note each value has a unique name.
# Give unique names to random values.
names(vec) <- c('root_pGL2__control', 'root_pGL2__hypoxia', 'root_pCO2__control', 'root_pCO2__hypoxia', 'notMapped')
vec
## root_pGL2__control root_pGL2__hypoxia root_pCO2__control root_pCO2__hypoxia
## 43 30 31 29
## notMapped
## 94
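The 'sample__condition' naming scheme can be taken apart again with strsplit, which shows how sample and condition information is encoded in the names. This base R sketch reuses the values printed above; representing a missing condition as NA is an assumption made only for this illustration.

```r
# The named vector printed above, written out explicitly.
vec <- c(root_pGL2__control=43, root_pGL2__hypoxia=30,
  root_pCO2__control=31, root_pCO2__hypoxia=29, notMapped=94)

# Split each name at the reserved separator '__'.
parts <- strsplit(names(vec), '__', fixed=TRUE)
sam <- vapply(parts, function(x) x[1], character(1))
con <- vapply(parts, function(x) {
  if (length(x) > 1) x[2] else NA_character_
}, character(1))

data.frame(sample=sam, condition=con, value=unname(vec))
```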
The class data frame applies to more items (e.g. genes) assayed in several samples and/or conditions (e.g. 2 samples under 2 conditions). Columns and rows are samples/conditions and assayed items respectively. Similarly, if one or more conditions are provided, the column names should follow the scheme 'sample__condition'. If no conditions are provided, all the samples are assumed to have the same condition.
Take the same samples and conditions in the vector case as example. Make a numeric data frame of 20 rows and 5 columns.
# Make a numeric data frame.
df.test <- data.frame(matrix(sample(x=1:1000, size=100), nrow=20))
Name columns with the names in above vector and rows with 20 genes (gene1, gene2, ..., gene20).
# Name the columns.
colnames(df.test) <- names(vec)
# Name the rows.
rownames(df.test) <- paste0('gene', 1:20)
# A slice of the data frame.
df.test[1:3, ]
## root_pGL2__control root_pGL2__hypoxia root_pCO2__control
## gene1 956 225 855
## gene2 896 275 838
## gene3 733 302 664
## root_pCO2__hypoxia notMapped
## gene1 417 970
## gene2 268 708
## gene3 68 706
In the downstream interactive network (refer to the spatialHeatmap vignette), if users want to have a gene annotation by mousing over a node, a column of gene annotation can be appended to the data frame. For example, the 20 genes are annotated as ann1, ann2, ..., ann20.
df.test$ann <- paste0('ann', 1:20)
df.test[1:3, ]
## root_pGL2__control root_pGL2__hypoxia root_pCO2__control
## gene1 956 225 855
## gene2 896 275 838
## gene3 733 302 664
## root_pCO2__hypoxia notMapped ann
## gene1 417 970 ann1
## gene2 268 708 ann2
## gene3 68 706 ann3
Note, if the data frame is imported from a file, the file name should not contain parentheses. E.g. target_arab.txt is expected while target_arab(1).txt will cause errors.
The spatialHeatmap also works on EBI aSVG format, which is used in Expression Atlas Anatomogram, and the specific requirements are listed at the bottom of this page. The following sections illustrate the transition of aSVGs from the SHM format to EBI format, which will be helpful if one wants to create aSVGs working with both spatialHeatmap and Expression Atlas Anatomogram.
Download the EBI SVG template EBI_template.svg, which is derived from the Expression Atlas Anatomogram, and open it in Inkscape. It contains 2 layers, LAYER_OUTLINE and LAYER_EFO. The former is expected to store organism outline shapes while the latter stores tissue shapes. As a template, both layers are empty except that LAYER_OUTLINE contains a green icon, which links to the Expression Atlas Licence.
Note that in EBI format, although they start with svg:g, the outline (LAYER_OUTLINE) and tissue (LAYER_EFO) elements must be layers rather than groups, which is indicated by the groupmode attribute with the layer value.
Copy height, width, viewBox values in top <svg ...> element from root_cross_simple.svg to EBI_template.svg respectively.
The green icon might be shrunk. To resize the icon, select it, click the lock on the top toolbar, which maintains the aspect ratio, and increase the height (H) or width (W). Move the icon to the bottom left corner.
In root_cross_simple.svg, ungroup (Ctrl+Shift+G) the container group, select all shapes, and copy all.
In EBI_template.svg, open layer panel (Ctrl+Shift+L) and make sure the 'unlock' icon is in front of 'EFO' and 'Outline'. Click the 'EFO' layer, and paste all tissue shapes from root_cross_simple.svg. Then these shapes are inside the 'EFO' layer.
The tissues in 'EFO' layer should be annotated, which includes ontology ids and tissue identifiers. Take the root_pSCR (root endodermis) tissue for example. Click the tissue in 'XML Editor', and click 'New element node' on the top. Type in 'svg:title' and click 'Create'. Then a title node is created at the bottom of root_pSCR.
Click the title node and click 'New text node' at the top. Then an empty text node is created inside the title node. Click the text node and type in 'root_pSCR' on the right.
Set title id as root_pSCR, then the title is done.
Look up 'root endodermis', which root_pSCR stands for, in Ontology Lookup Service, and set the root_pSCR group id as PO:0005059. Then the annotation of root_pSCR is done.
Annotate other tissues in the same way.
If there are outline shapes, they should be placed in the 'LAYER_OUTLINE' layer. For illustration purpose, a root outline is created.
Click 'Outline' in the layer panel, and draw an outline as explained above. Then the outline shape is created in the 'Outline' layer.
Make sure all paths have absolute coordinates as shown in Section Absolute Path Position.
Next, make sure the canvas is resized as shown in Section Resize Canvas.
According to EBI guidelines, all tissue shapes ('EFO' layer) are expected to have style="fill:none; stroke:none". To do so, click on 'EFO' in the layer panel, select all tissue shapes (Ctrl+A), then click 'No paint' under both 'Fill' and 'Stroke paint' on the fill and stroke panel. Then all the tissue shapes are transparent.
The EBI aSVG follows a different naming scheme '<species>[.view].svg', so this image can be saved as 'arabidopsis_thaliana.root_cross.svg', which is downloadable here and ready to use in section Spatial Heatmap.
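As an illustration of the '<species>[.view].svg' scheme, the file name can be assembled from its two parts in base R. The helper ebi_svg_name() is hypothetical, and the lowercasing and underscore substitution are assumptions inferred from the example name above rather than documented rules.

```r
# ebi_svg_name() is a hypothetical helper, not part of spatialHeatmap.
ebi_svg_name <- function(species, view = NULL) {
  # Assumed normalization: lowercase, whitespace replaced by underscores.
  species <- gsub("\\s+", "_", tolower(trimws(species)))
  # The view part is optional; c() drops it when it is NULL.
  parts <- c(species, view)
  paste0(paste(parts, collapse = "."), ".svg")
}

ebi_svg_name("Arabidopsis thaliana", "root_cross")
ebi_svg_name("Arabidopsis thaliana")
```

The first call yields the file name used in this tutorial, and the second shows the form without the optional view part.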
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] GEOquery_2.60.0 SummarizedExperiment_1.22.0
## [3] Biobase_2.52.0 GenomicRanges_1.44.0
## [5] GenomeInfoDb_1.28.0 IRanges_2.26.0
## [7] S4Vectors_0.30.0 BiocGenerics_0.38.0
## [9] MatrixGenerics_1.4.0 matrixStats_0.59.0
## [11] spatialHeatmap_1.1.6 knitr_1.33
## [13] nvimcom_0.9-25
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.1 shinydashboard_0.7.1
## [3] tidyselect_1.1.1 RSQLite_2.2.7
## [5] AnnotationDbi_1.54.0 htmlwidgets_1.5.3
## [7] grid_4.1.0 BiocParallel_1.26.0
## [9] munsell_0.5.0 ScaledMatrix_1.0.0
## [11] codetools_0.2-18 preprocessCore_1.54.0
## [13] av_0.6.0 withr_2.4.2
## [15] colorspace_2.0-1 filelock_1.0.2
## [17] highr_0.9 rstudioapi_0.13
## [19] SingleCellExperiment_1.14.1 labeling_0.4.2
## [21] GenomeInfoDbData_1.2.6 farver_2.1.0
## [23] bit64_4.0.5 distinct_1.4.0
## [25] rhdf5_2.36.0 vctrs_0.3.8
## [27] generics_0.1.0 rols_2.20.0
## [29] xfun_0.23 BiocFileCache_2.0.0
## [31] fastcluster_1.2.3 R6_2.5.0
## [33] doParallel_1.0.16 ggbeeswarm_0.6.0
## [35] rsvd_1.0.5 locfit_1.5-9.4
## [37] rsvg_2.1.2 bitops_1.0-7
## [39] rhdf5filters_1.4.0 cachem_1.0.5
## [41] gridGraphics_0.5-1 DelayedArray_0.18.0
## [43] assertthat_0.2.1 promises_1.2.0.1
## [45] scales_1.1.1 nnet_7.3-15
## [47] beeswarm_0.4.0 gtable_0.3.0
## [49] beachmat_2.8.0 WGCNA_1.70-3
## [51] rlang_0.4.11 genefilter_1.74.0
## [53] splines_4.1.0 lazyeval_0.2.2
## [55] impute_1.66.0 checkmate_2.0.0
## [57] BiocManager_1.30.15 yaml_2.2.1
## [59] reshape2_1.4.4 backports_1.2.1
## [61] httpuv_1.6.1 Hmisc_4.5-0
## [63] tools_4.1.0 ggplotify_0.0.7
## [65] ggplot2_3.3.3 ellipsis_0.3.2
## [67] gplots_3.1.1 jquerylib_0.1.4
## [69] RColorBrewer_1.1-2 ggdendro_0.1.22
## [71] dynamicTreeCut_1.63-1 Rcpp_1.0.6
## [73] plyr_1.8.6 base64enc_0.1-3
## [75] sparseMatrixStats_1.4.0 visNetwork_2.0.9
## [77] progress_1.2.2 zlibbioc_1.38.0
## [79] purrr_0.3.4 RCurl_1.98-1.3
## [81] prettyunits_1.1.1 rpart_4.1-15
## [83] viridis_0.6.1 cluster_2.1.1
## [85] magrittr_2.0.1 data.table_1.14.0
## [87] grImport_0.9-3 hms_1.1.0
## [89] mime_0.10 evaluate_0.14
## [91] xtable_1.8-4 XML_3.99-0.6
## [93] jpeg_0.1-8.1 gridExtra_2.3
## [95] compiler_4.1.0 scater_1.20.0
## [97] tibble_3.1.2 KernSmooth_2.23-18
## [99] crayon_1.4.1 htmltools_0.5.1.1
## [101] later_1.2.0 Formula_1.2-4
## [103] tidyr_1.1.3 geneplotter_1.70.0
## [105] DBI_1.1.1 dbplyr_2.1.1
## [107] MASS_7.3-53.1 rappdirs_0.3.3
## [109] readr_1.4.0 Matrix_1.3-2
## [111] igraph_1.2.6 pkgconfig_2.0.3
## [113] flashClust_1.01-2 rvcheck_0.1.8
## [115] foreign_0.8-81 plotly_4.9.3
## [117] scuttle_1.2.0 xml2_1.3.2
## [119] foreach_1.5.1 annotate_1.70.0
## [121] vipor_0.4.5 bslib_0.2.5.1
## [123] rngtools_1.5 XVector_0.32.0
## [125] doRNG_1.8.2 stringr_1.4.0
## [127] digest_0.6.27 Biostrings_2.60.1
## [129] rmarkdown_2.8 htmlTable_2.2.1
## [131] edgeR_3.34.0 DelayedMatrixStats_1.14.0
## [133] curl_4.3.1 shiny_1.6.0
## [135] gtools_3.9.2 lifecycle_1.0.0
## [137] jsonlite_1.7.2 Rhdf5lib_1.14.0
## [139] BiocNeighbors_1.10.0 viridisLite_0.4.0
## [141] limma_3.48.0 fansi_0.5.0
## [143] pillar_1.6.1 lattice_0.20-41
## [145] KEGGREST_1.32.0 fastmap_1.1.0
## [147] httr_1.4.2 survival_3.2-10
## [149] GO.db_3.13.0 glue_1.4.2
## [151] UpSetR_1.4.0 png_0.1-7
## [153] iterators_1.0.13 bit_4.0.4
## [155] stringi_1.6.2 sass_0.4.0
## [157] HDF5Array_1.20.0 blob_1.2.1
## [159] DESeq2_1.32.0 BiocSingular_1.8.0
## [161] latticeExtra_0.6-29 caTools_1.18.2
## [163] memoise_2.0.0 dplyr_1.0.6
## [165] irlba_2.3.3
Davis, Sean, and Paul Meltzer. 2007. “GEOquery: A Bridge Between the Gene Expression Omnibus (GEO) and BioConductor.” Bioinformatics 23 (14): 1846–7.
Gautier, Laurent, Leslie Cope, Benjamin M. Bolstad, and Rafael A. Irizarry. 2004. “Affy—analysis of Affymetrix GeneChip Data at the Probe Level.” Bioinformatics 20 (3). Oxford, UK: Oxford University Press: 307–15. doi:10.1093/bioinformatics/btg405.
Maag, Jesper L V. 2018. “Gganatogram: An R Package for Modular Visualisation of Anatograms and Tissues Based on Ggplot2.” F1000Res. 7 (September): 1576.
Morgan, Martin, Valerie Obenchain, Jim Hester, and Hervé Pagès. 2018. SummarizedExperiment: SummarizedExperiment Container.
Muschelli, John, Elizabeth Sweeney, and Ciprian Crainiceanu. 2014. “BrainR: Interactive 3 and 4D Images of High Resolution Neuroimage Data.” R J. 6 (1): 41–48.
Mustroph, Angelika, M Eugenia Zanetti, Charles J H Jang, Hans E Holtan, Peter P Repetti, David W Galbraith, Thomas Girke, and Julia Bailey-Serres. 2009. “Profiling Translatomes of Discrete Cell Populations Resolves Altered Cellular Priorities During Hypoxia in Arabidopsis.” Proc Natl Acad Sci U S A 106 (44): 18843–8.
Waese, Jamie, Jim Fan, Asher Pasha, Hans Yu, Geoffrey Fucile, Ruian Shi, Matthew Cumming, et al. 2017. “EPlant: Visualizing and Exploring Multiple Levels of Data for Hypothesis Generation in Plant Biology.” Plant Cell 29 (8): 1806–21.
Winter, Debbie, Ben Vinegar, Hardeep Nahal, Ron Ammar, Greg V Wilson, and Nicholas J Provart. 2007. “An ‘Electronic Fluorescent Pictograph’ Browser for Exploring and Analyzing Large-Scale Biological Data Sets.” PLoS One 2 (8): e718.